AITopics | dynamic scene

Collaborating Authors

dynamic scene

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MoPo-Fr123121Dyee namsneo4D cuicla GContraus Vidsreolioan PoiSplntats ting

Neural Information Processing SystemsJun-23-2026, 03:29:33 GMT

Novel view synthesis from monocular videos of dynamic scenes with unknown While camera recent poses remains advances a in fundamental 3D representations challenge such in computer as Neural vision Radiance and graphics. Fields (NeRF) scenes, and they 3D struggle Gaussian with Splatting dynamic (3DGS) content ha and ve sho typically wn promising rely on results pre-computed for static camera poses. We present 4D3R, a pose-free dynamic neural rendering framework that Our method decouples first static leverages and dynamic 3D foundational components models through for initial a tw pose o-stage and approach.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Track3R: Joint Point Map and Trajectory Prior for Spatiotemporal 3DUnderstanding

Neural Information Processing SystemsJun-23-2026, 02:30:05 GMT

Understanding the 3D world from 2D monocular videos is a crucial ability for AI. Recently, to tackle this underdetermined task, end-to-end 3D geometry priors have been sought after, such as pre-trained point map models at scale. These models enable robust 3D understanding from casually taken videos, providing accurate object shapes disentangled from uncertain camera parameters. However, they still struggle when affected by object deformation and dynamics, failing to establish consistent correspondence over the frames. Furthermore, their architectures are typically limited to pairwise frame processing, which is insufficient for capturing complex motion dynamics over extended sequences. To address these limitations, we introduce Track3R, a novel framework that integrates a new architecture and task to jointly predict point map and motion trajectories across multiple frames from video input. Specifically, our key idea is modeling two disentangled trajectories for each point: one representing object motion and the other camera poses. This design not only can enable understanding of the 3D object dynamics, but also facilitates the learning of more robust priors for 3D shapes in dynamic scenes. In our experiments, Track3R demonstrates significant improvements in a joint point mapping and 3D motion estimation task for dynamic scenes, such as 25.8% improvements in the motion estimation, and 15.7% in the point mapping accuracy.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4DScene Generation

Neural Information Processing SystemsJun-23-2026, 02:29:27 GMT

We propose the first framework capable of computing a 4D spatio-temporal grid of video architecture.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Graphics (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Segment then Splat: Unified 3DOpen-Vocabulary Segmentation via Gaussian Splatting

Neural Information Processing SystemsJun-23-2026, 02:29:10 GMT

Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment then Splat, a 3D-aware open vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment then Splat reverses the long-established approach of "segmentation after reconstruction" by dividing Gaussians into distinct object sets before reconstruction.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

Neural Information Processing SystemsJun-22-2026, 23:22:31 GMT

Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content.

artificial intelligence, deep learning, machine learning, (13 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Flux4D: Flow-based Unsupervised 4DReconstruction

Neural Information Processing SystemsJun-20-2026, 05:53:00 GMT

Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision. While recent differentiable rendering methods such as NeRF and 3DGS have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple moving actors from the static scene, such as in autonomous driving scenarios. Existing selfsupervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic driving scenes. Flux4D directly predicts 3DGaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

4DGCPro: Efficient Hierarchical 4DGaussian Compression for Progressive Volumetric Video Streaming

Neural Information Processing SystemsJun-19-2026, 20:17:22 GMT

Achieving seamless viewing of high-fidelity volumetric video, comparable to 2D video experiences, remains an open challenge. Existing volumetric video compression methods either lack the flexibility to adjust quality and bitrate within a single model for efficient streaming across diverse networks and devices, or struggle with real-time decoding and rendering on lightweight mobile platforms. To address these challenges, we introduce 4DGCPro, a novel hierarchical 4DGaussian compression framework that facilitates real-time mobile decoding and high-quality rendering via progressive volumetric video streaming in a single bitstream. Specifically, we propose a perceptually-weighted and compression-friendly hierarchical 4D Gaussian representation with motion-aware adaptive grouping to reduce temporal redundancy, preserve coherence, and enable scalable multi-level detail streaming. Furthermore, we present an end-to-end entropy-optimized training scheme, which incorporates layer-wise rate-distortion (RD) supervision and attribute-specific entropy modeling for efficient bitstream generation. Extensive experiments show that 4DGCPro enables flexible quality and multiple bitrate within a single model, achieving real-time decoding and rendering on mobile devices while outperforming existing methods in RD performance across multiple datasets. The corresponding author is Qiang Hu(qiang.hu@sjtu.edu.cn)

artificial intelligence, machine learning, real time system, (18 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

Holistic Gaussian Splatting for Embodied View Synthesis

Neural Information Processing SystemsJun-19-2026, 13:29:30 GMT

We propose HoliGS, a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos. Unlike prior 4DGaussian splatting and dynamic NeRF pipelines, which struggle with training overhead in minute-long captures, our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Specifically, we decompose each scene into a static background plus time-varying objects, each represented by learned Gaussian primitives undergoing global rigid transformations, skeleton-driven articulation, and subtle non-rigid deformations via an invertible neural flow. This hierarchical warping strategy enables robust free-viewpoint novel-view rendering from various embodied camera trajectories by attaching Gaussians to a complete canonical foreground shape (e.g., egocentric or third-person follow), which may involve substantial viewpoint changes and interactions between multiple actors. Our experiments demonstrate that HoliGS achieves superior reconstruction quality on challenging datasets while significantly reducing both training and rendering time compared to state-of-the-art monocular deformable NeRFs.

artificial intelligence, machine learning, synthesis, (17 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos

Neural Information Processing SystemsJun-18-2026, 00:03:24 GMT

This paper presents a unified framework that allows high-quality dynamic Gaussian Splatting from both defocused and motion-blurred monocular videos. Due to the significant difference between the formation processes of defocus blur and motion blur, existing methods are tailored for either one of them, lacking the ability to simultaneously deal with both of them. Although the two can be jointly modeled as blur kernel-based convolution, the inherent difficulty in estimating accurate blur kernels greatly limits the progress in this direction. In this work, we go a step further towards this direction. Particularly, we propose to estimate per-pixel reliable blur kernels using a blur prediction network that exploits blur-related scene and camera information and is subject to a blur-aware sparsity constraint. Besides, we introduce a dynamic Gaussian densification strategy to mitigate the lack of Gaussians for incomplete regions, and boost the performance of novel view synthesis by incorporating unseen view information to constrain scene optimization. Extensive experiments show that our method outperforms the state-of-the-art methods in generating photorealistic novel view synthesis from defocused and motion-blurred monocular videos.

artificial intelligence, deep learning, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos

Neural Information Processing SystemsJun-16-2026, 21:18:20 GMT

We explore novel-view synthesis for dynamic scenes from monocular videos. Prior approaches rely on costly test-time optimization of 4D representations or do not preserve scene geometry when trained in a feed-forward manner. Our approach is based on three key insights: (1) covisible pixels (that are visible in both the input and target views) can be rendered by first reconstructing the dynamic 3D scene and rendering the reconstruction from the novel-views and (2) hidden pixels in novel views can be "inpainted" with feed-forward 2D video diffusion models. Notably, our video inpainting diffusion model (CogNVS) can be self-supervised from 2D videos, allowing us to train it on a large corpus of in-the-wild videos. This in turn allows for (3) CogNVS to be applied zero-shot to novel test videos via test-time finetuning. We empirically verify that CogNVS outperforms almost all prior art for novel-view synthesis of dynamic scenes from monocular videos.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback